Implements NestingType.Flat#2409
Draft
GarrettBeatty wants to merge 22 commits into
Draft
Conversation
stack-info: PR: #2361, branch: GarrettBeatty/stack/3
|
|
||
| COPY bin/publish/ ${LAMBDA_TASK_ROOT} | ||
|
|
||
| ENTRYPOINT ["/var/task/bootstrap"] |
|
|
||
| COPY bin/publish/ ${LAMBDA_TASK_ROOT} | ||
|
|
||
| ENTRYPOINT ["/var/task/bootstrap"] |
There was a problem hiding this comment.
Pull request overview
This PR implements NestingType.Flat support for ParallelAsync and MapAsync in Amazon.Lambda.DurableExecution, switching per-unit execution to “virtual” child contexts that do not emit per-branch/per-item CONTEXT checkpoints, while still keeping deterministic operation IDs and re-parenting inner operations to the nearest non-virtual ancestor. It also adds unit + integration tests to validate replay behavior and the new checkpoint/payload contract.
Changes:
- Enable Flat nesting by suppressing per-unit child
CONTEXTSTART/SUCCEED/FAIL checkpoints and persisting per-unit results/errors inline on the parent operation payload. - Add support for “virtual” child ID generation so inner ops keep branch/item-scoped IDs but report the non-virtual parent as
ParentId. - Add/extend unit and integration tests validating: no per-unit contexts, correct re-parenting, and correct replay reconstruction from inline payloads.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/ParallelOperationTests.cs | Updates tests to validate Flat parallel semantics (no branch contexts, inline replay, re-parenting). |
| Libraries/test/Amazon.Lambda.DurableExecution.Tests/MapOperationTests.cs | Updates tests to validate Flat map semantics (no item contexts, inline replay, re-parenting). |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/ParallelFlatNestingFunction.csproj | New integration test function project for Flat parallel. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/Function.cs | New Flat parallel test workflow exercising replay via WaitAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/ParallelFlatNestingFunction/Dockerfile | Container packaging for the new Flat parallel test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/MapFlatNestingFunction.csproj | New integration test function project for Flat map. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/Function.cs | New Flat map test workflow exercising replay via WaitAsync. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/TestFunctions/MapFlatNestingFunction/Dockerfile | Container packaging for the new Flat map test function. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/ParallelFlatNestingTest.cs | New end-to-end AWS integration test asserting Flat parallel history/IDs/parenting. |
| Libraries/test/Amazon.Lambda.DurableExecution.IntegrationTests/MapFlatNestingTest.cs | New end-to-end AWS integration test asserting Flat map history/IDs/parenting. |
| Libraries/src/Amazon.Lambda.DurableExecution/ParallelConfig.cs | Updates docs to describe Flat’s inline result/error behavior. |
| Libraries/src/Amazon.Lambda.DurableExecution/NestingType.cs | Updates enum docs to reflect Flat now being implemented and its semantics. |
| Libraries/src/Amazon.Lambda.DurableExecution/MapConfig.cs | Updates docs to describe Flat’s inline result/error behavior. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ParallelOperation.cs | Plumbs Flat configuration into concurrent orchestration via isVirtual. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/MapOperation.cs | Plumbs Flat configuration into concurrent orchestration via isVirtual. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/OperationIdGenerator.cs | Adds “virtual child” ID generator that decouples ID prefix from reported ParentId. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ConcurrentOperation.cs | Implements inline per-unit payload persistence (Flat) and replay reconstruction paths. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/ChildContextOperation.cs | Suppresses per-unit CONTEXT checkpoints when the child context is virtual (Flat). |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchSummary.cs | Extends summary payload to optionally include inline Result / Error for Flat units. |
| Libraries/src/Amazon.Lambda.DurableExecution/Internal/BatchJsonContext.cs | Adds source-gen metadata for ErrorObject in the batch summary payload. |
| Libraries/src/Amazon.Lambda.DurableExecution/DurableContext.cs | Removes Flat “not supported” guard and updates child-context factory to support virtual parenting. |
| Libraries/src/Amazon.Lambda.DurableExecution/CLAUDE.md | Adds repository-specific architecture/testing guidance (documentation only). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+612
to
+634
| // Flat (virtual) units have no child checkpoint — their result/error | ||
| // was recorded inline on this summary. Nested units read from the | ||
| // child's own CONTEXT checkpoint. A unit is "inline" when the summary | ||
| // entry carries a Result/Error, which only Flat writes. | ||
| if (_isVirtual && summaryEntry != null) | ||
| { | ||
| if (status == BatchItemStatus.Succeeded && summaryEntry.Result != null) | ||
| { | ||
| unitResult = DeserializeResult(summaryEntry.Result); | ||
| } | ||
| else if (status == BatchItemStatus.Failed && summaryEntry.Error != null) | ||
| { | ||
| var err = summaryEntry.Error; | ||
| unitError = new ChildContextException(err.ErrorMessage ?? "Unit failed") | ||
| { | ||
| SubType = ChildSubType, | ||
| ErrorType = err.ErrorType, | ||
| ErrorData = err.ErrorData, | ||
| OriginalStackTrace = err.StackTrace | ||
| }; | ||
| } | ||
| } | ||
| else if (status == BatchItemStatus.Succeeded && childOp?.ContextDetails?.Result != null) |
Adds parallel branch execution to the .NET Durable Execution SDK.
ParallelAsync runs N branches concurrently with configurable concurrency
limits and completion policies, returning an IBatchResult<T> with
per-branch status and error information.
Per-branch checkpoint payloads are serialized via the ILambdaSerializer
registered on ILambdaContext.Serializer (typically configured through
LambdaBootstrapBuilder.Create(handler, serializer)), matching the
StepAsync / RunInChildContextAsync pattern. There are no separate
reflection / AOT-safe overload pairs: the AOT story is determined
entirely by which serializer the user registers with the runtime.
Public surface:
- IDurableContext.ParallelAsync<T> (2 overloads: Func[] vs
DurableBranch<T>[])
- DurableBranch<T> record (Name + Func)
- ParallelConfig (MaxConcurrency, CompletionConfig, NestingType)
- CompletionConfig with factories AllSuccessful() / FirstSuccessful() /
AllCompleted(); ToleratedFailureCount / ToleratedFailurePercentage
(validated 0.0-1.0)
- IBatchResult<T> with All / Succeeded / Failed / Started accessors,
GetResults, GetErrors, ThrowIfError, HasFailure, CompletionReason,
count properties
- IBatchItem<T> with Index, Name, Status, Result, Error
- BatchItemStatus { Succeeded, Failed, Started }
- CompletionReason { AllCompleted, MinSuccessfulReached,
FailureToleranceExceeded }
- NestingType (Nested default; Flat throws NotSupportedException - reserved)
- ParallelException (carries IBatchResult; future-subclassable)
Internal:
- ParallelOperation<T> orchestrator dispatches branches with optional
semaphore-bounded concurrency. Each branch runs as a
ChildContextOperation<T> with deterministic ID via
OperationIdGenerator.CreateChild.
- Branch failures aggregated as IBatchItem<T> entries; orchestrator
throws ParallelException only when CompletionConfig signals
FailureToleranceExceeded.
- Parent CONTEXT checkpoint records summary (CompletionReason +
per-branch index/name/status); branch results live on per-branch
CONTEXT checkpoints.
- ExecutionState now thread-safe (lock around reads/writes of
_operations, _visitedOperations, _isReplaying). Required for
concurrent branch replay; affects all operations but no regressions.
- ParallelOperation awaits Task.WhenAll(inFlight) before disposing
the semaphore so cancellation/exception during dispatch lets
in-flight branches settle cleanly.
- Reuses OperationSubTypes.Parallel / OperationSubTypes.ParallelBranch
from Wave 0.
Adds 31 unit tests + 6 integration tests covering CompletionConfig
matrix, MaxConcurrency, FirstSuccessful short-circuit, replay
determinism, mixed-status replay, cancellation, and concurrency
stress.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix tests
change file
Validate CompletionConfig thresholds and honor checkpointed branch names
- Add range validation to CompletionConfig.MinSuccessful (>= 1) and
ToleratedFailureCount (>= 0), matching the existing
ToleratedFailurePercentage setter. Previously zero/negative values
produced nonsensical immediate short-circuits.
- ReconstructFromCheckpoints now uses the branch Name persisted in the
parallel summary instead of always reading the current branch name,
and throws NonDeterministicExecutionException on name drift between
deployments (the prior path silently ignored summaryEntry.Name).
- Correct XML docs for BatchItemStatus.Started / IBatchResult.Started /
CompletionConfig.FirstSuccessful: Started means a branch was not
dispatched before a completion short-circuit fired (or has no
checkpoint on replay), not that it is still running.
Implements IDurableContext.MapAsync, processing a collection in parallel with one child context per item. Mirrors the Python/JS/Java SDKs, where Map is a sibling of Parallel sharing one concurrency engine. - Extract ConcurrentOperation<T> base holding all orchestration, completion, checkpoint, and replay logic; ParallelOperation and MapOperation are thin subclasses supplying only the per-unit (name, func), sub-type labels, and failure-exception factory. - MapConfig defaults CompletionConfig to AllCompleted() (permissive), matching Python/Java Map; intentionally differs from ParallelConfig's AllSuccessful(). Adds ItemNamer; no ItemBatcher (not implemented in any reference SDK). - New MapException so callers can distinguish Map from Parallel failures. - Generalize ParallelSummary/ParallelJsonContext into shared BatchSummary/ BatchJsonContext. - Tests: 24 unit tests (MapOperationTests) + 6 integration functions/tests mirroring the Parallel set. Full suite 325/325 on net8.0 and net10.0.
97d51e0 to
e0b6f5f
Compare
54a24e4 to
e1db9a7
Compare
e0b6f5f to
dcda8da
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue #, if available: #2216
Description of changes:
Summary
Implements
NestingType.FlatforParallelAsyncandMapAsync. Previously both threwNotSupportedExceptionwhenNestingType.Flatwas supplied;Nested(the default) was the only wired mode. This PR wires upFlatfor both operations using an inline-payload strategy that matches the behavior of the Python, JS, and Java durable-execution SDKs.What
NestingType.FlatisWhen a parallel/map operation runs, each branch/item normally executes inside its own child
CONTEXToperation, which emitsSTART+SUCCEED/FAILcheckpoints. For large fan-outs this is a lot of checkpoint writes — roughly one extraCONTEXTop per branch on top of the work each branch actually does.NestingTypecontrols how those branches are represented in the checkpoint graph:Nested(default) — each branch produces a fullCONTEXToperation. Maximum observability in execution traces; more checkpoint operations.Flat— each branch runs in a virtual context that emits noCONTEXTcheckpoint of its own. Per-branch results/errors are recorded inline on the parent parallel/map operation's payload instead. Fewer checkpoints, at the cost of less granular traces.Nestedremains the default, so existing workflows are unaffected and opting in is non-breaking.How it works
A flat branch is a logical scope that owns an ID namespace and a logger scope but is invisible in the checkpoint tree. Four pieces make that work:
Decoupled ID generation (
OperationIdGenerator) — a newCreateVirtualChild(operationId, reportedParentId)separates the hash prefix used to derive inner-operation IDs from theParentIdreported on those operations:hash("{branchOpId}-1"), …), so two sibling branches never collide on inner IDs. The ID space is identical toNested— load-bearing for deterministic replay.ParentId, because the virtual branch emits noCONTEXTcheckpoint for them to reference.Checkpoint suppression (
ChildContextOperation) — anisVirtualflag suppresses the branch'sCONTEXTSTART/SUCCEED/FAILenqueues. The exception still propagates on failure; inner ops (steps/waits) still checkpoint normally, re-parented per (1).Inline result/error payload (
BatchSummary/BatchUnitSummary) — underNested, per-unit results are read back from each child's ownCONTEXTcheckpoint on replay. UnderFlatthere are no child checkpoints, so each unit's serialized result (orErrorObject) is recorded inline on the parent operation'sBatchSummarypayload.ConcurrentOperationpersists this payload even onFAIL(Nested omits it, since it can rebuild from child checkpoints).Replay reconstruction (
ConcurrentOperation.ReconstructFromCheckpoints) — when virtual, per-unit results/errors are read from the inline summary instead of from child operations.This mirrors the Python/Java SDKs' inline-payload approach (the JS SDK re-executes branch bodies on replay instead; inline is the better fit for .NET's cached-replay model).
Tests
Unit tests (
ParallelOperationTests,MapOperationTests) — replaced the two*_NestingTypeFlat_ThrowsNotSupportedtests with Flat behavior coverage: per-branchCONTEXTops suppressed, inner-op re-parenting to the parallel/map op, inline partial-failure surfacing, and replay-from-inline-payload (success + failure).Integration tests (
ParallelFlatNestingTest,MapFlatNestingTest) — end-to-end against the real durable-execution service. Each runs 3 branches/items with a step + durable wait (the wait forces a suspend/resume so the operation genuinely replays) and asserts against service history:CONTEXTop (the parent) — no per-branch/item contexts,Event.ParentId),All 331 unit tests pass on net8.0 and net10.0.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.